14 research outputs found

A Simulation Suite for Lattice-Boltzmann based Real-Time CFD Applications Exploiting Multi-Level Parallelism on Modern Multi- and Many-Core Architectures

Author: Geveler Markus
Göddeke Dominik
Mallach Sven
Ribbrock Dirk
Publication venue: 'Elsevier BV'
Publication date: 01/01/2011

We present a software approach to hardware-oriented numerics which builds upon an augmented, previously published open-source set of libraries facilitating portable code development and optimisation on a wide range of modern computer architectures. In order to maximise efficiency, we exploit all levels of parallelism, including vectorisation within CPU cores, the Cell BE and GPUs, shared memory thread-level parallelism between cores, and parallelism between heterogeneous distributed memory resources in clusters. To evaluate and validate our approach, we implement a collection of modular building blocks for the easy and fast assembly and development of CFD applications based on the shallow water equations: we combine the Lattice-Boltzmann method with fluid-structure interaction techniques in order to achieve real-time simulations targeting interactive virtual environments. Our results demonstrate that recent multi-core CPUs outperform the Cell BE, while GPUs are significantly faster than conventional multi-threaded SSE code. In addition, we verify good scalability properties of our application on small clusters.

computer science publication server

Kölner UniversitätsPublikationsServer

Very fast FEM Poisson solvers on lower precision accelerator hardware

Author: Ribbrock Dirk
Ruda Dustin
Turek Stefan
Zajac Peter
Publication date: 24/11/2022

Graphics cards equipped with Tensor Core units designed for AI applications, for example the NVIDIA Ampere A100, promise very high peak computing performance (156 TFLOP/s in single and 312 TFLOP/s in half precision in the case of the A100). This performance is only achieved when performing arithmetically intensive operations such as dense matrix multiplications in the aforementioned lower precision, which is an obstacle when trying to use this hardware for solving linear systems arising from PDEs discretized with the finite element method. In previous works, we delivered a proof of concept that the predecessor of the A100, the V100 and its Tensor Cores, can be exploited to a great extent when solving Poisson's equation on the unit square if a hardware-oriented direct solver based on prehandling via hierarchical finite elements and a Schur complement approach is used. In this work, using numerical results on an A100 graphics card, we show that the method also achieves very high performance if Poisson's equation, discretized by linear finite elements, is solved on a more complex domain corresponding to a flow around a square configuration.

Scipedia

HONEI: A collection of libraries for numerical computations targeting multiple processor architectures.

Author: Geveler Markus
Gutwenger Carsten
Göddeke Dominik
Mallach Sven
Ribbrock Dirk
van Dyk Danny
Publication venue: 'Elsevier BV'
Publication date: 01/01/2009

We present HONEI, an open-source collection of libraries offering a hardware-oriented approach to numerical calculations. HONEI abstracts the hardware, and applications written on top of HONEI can be executed on a wide range of computer architectures such as CPUs, GPUs and the Cell processor. We demonstrate the flexibility and performance of our approach with two test applications, a Finite Element multigrid solver for the Poisson problem and a robust and fast simulation of shallow water waves. By linking against HONEI's libraries, we achieve a twofold speedup over straightforward C++ code using HONEI's SSE backend, and an additional 3-4 and 4-16 times faster execution on the Cell and a GPU, respectively. A second important aspect of our approach is that the full performance capabilities of the hardware under consideration can be exploited by adding optimised application-specific operations to the HONEI libraries. HONEI provides all necessary infrastructure for development and evaluation of such kernels, significantly simplifying their development.

arXiv.org e-Print Archive

computer science publication server

Kölner UniversitätsPublikationsServer

The Concept of Prehandling as Direct Preconditioning for Poisson-like Problems

Author: Ribbrock Dirk
Ruda Dustin
Turek Stefan
Zajac Peter

To benefit from current trends in HPC hardware, such as the increasing availability of low precision hardware, we present the concept of prehandling as a direct way of preconditioning, together with the hierarchical finite element method, which is exceptionally well-suited to applying prehandling to Poisson-like problems, at least in 1D and 2D. Such problems are known to cause ill-conditioned stiffness matrices and therefore high computational errors due to round-off. We show by means of numerical results that prehandling via the hierarchical finite element method significantly reduces the condition number (while advantageous properties are preserved), which enables us to obtain sufficiently accurate solutions to Poisson-like problems even if lower computing precision (i.e. single or half precision format) is used.
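
The effect of a hierarchical basis on the condition number can be illustrated with a minimal 1D sketch. It is not the authors' prehandling implementation: the function names, the uniform mesh on (0, 1) with homogeneous Dirichlet boundary conditions, and the pure-Python dense matrices are all assumptions made for illustration. The sketch assembles the nodal stiffness matrix for linear elements, changes to hierarchical hat functions, and compares condition numbers.

```python
import math


def nodal_stiffness(n):
    """Stiffness matrix of -u'' = f on (0, 1), linear elements, n intervals."""
    h = 1.0 / n
    m = n - 1  # number of interior nodes
    A = [[0.0] * m for _ in range(m)]
    for i in range(m):
        A[i][i] = 2.0 / h
        if i > 0:
            A[i][i - 1] = -1.0 / h
        if i < m - 1:
            A[i][i + 1] = -1.0 / h
    return A


def hierarchical_basis(levels):
    """Matrix S with S[j][k] = value of hierarchical hat k at fine node j + 1."""
    n = 2 ** levels
    cols = []
    for level in range(1, levels + 1):
        step = n // 2 ** level  # half-width of the hat, in fine-grid cells
        for center in range(step, n, 2 * step):  # nodes new on this level
            cols.append([max(0.0, 1.0 - abs(j - center) / step)
                         for j in range(1, n)])
    return [list(row) for row in zip(*cols)]


def matmul(X, Y):
    """Dense matrix product for lists of rows."""
    Yt = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in Yt] for row in X]


def condition_numbers(levels):
    """Return (largest off-diagonal entry, hierarchical cond, nodal cond)."""
    n = 2 ** levels
    m = n - 1
    A = nodal_stiffness(n)
    S = hierarchical_basis(levels)
    St = [list(row) for row in zip(*S)]
    A_hier = matmul(matmul(St, A), S)  # stiffness matrix in the new basis
    off = max((abs(A_hier[i][j]) for i in range(m) for j in range(m) if i != j),
              default=0.0)
    diag = [A_hier[i][i] for i in range(m)]
    # A_hier is diagonal in 1D, so its condition number is max/min of the diagonal.
    cond_hier = max(diag) / min(diag)
    # Closed form for tridiag(-1, 2, -1): cond = cot^2(pi / (2 n)).
    cond_nodal = 1.0 / math.tan(math.pi / (2 * n)) ** 2
    return off, cond_hier, cond_nodal


if __name__ == "__main__":
    print(condition_numbers(5))
```

For 5 levels (32 intervals) the transformed matrix is diagonal with condition number 16, against about 414 for the nodal matrix; the nodal value grows quadratically with the number of intervals, the hierarchical one only linearly. In 2D the transformed matrix is no longer diagonal, so this sketch only hints at why the abstract restricts its claim to low dimensions.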

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung

Basic Machine Learning Approaches for the Acceleration of PDE Simulations and Realization in the FEAT3 Software

Author: Geveler Markus
Ribbrock Dirk
Ruelmann Hannes
Turek Stefan
Zajac Peter

In this paper we present a holistic software approach based on the FEAT3 software for solving multidimensional PDEs with the Finite Element Method that is built for maximum performance, scalability, maintainability and extensibility. We introduce basic paradigms for exploiting modern computational hardware architectures such as GPUs in a numerically scalable fashion. We show how the framework is extended to make even the most recent advances on the hardware market accessible, exemplified by the ubiquitous trend to customize chips for Machine Learning. We demonstrate that, for a numerically challenging model problem, artificial neural networks can be used while preserving a classical simulation solution pipeline through the incorporation of a neural network preconditioner in the linear solve.

Eldorado - Ressourcen aus und für Lehre, Studium und Forschung